FastEx: Hash Clustering with Exponential Families

Authors

  • Amr Ahmed
  • Sujith Ravi
  • Shravan M. Narayanamurthy
  • Alexander J. Smola
Abstract

Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution that uses random projections to generate proposals at high throughput, which is crucial for clustering models with large numbers of clusters.
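The abstract does not spell out how random projections speed up proposal generation. As a rough illustration of the general idea only, and not the paper's sampler, the sketch below uses sign random projections: each vector is reduced to a bit signature, and the fraction of mismatching bits between two signatures estimates the angle between the vectors. A sampler could use such cheap estimates to score a data point against many cluster parameters. All names here (make_projections, signature, estimated_angle, true_angle) are hypothetical and chosen for this example.

```python
import math
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, projections):
    """One bit per projection: 1 if the projected value is non-negative."""
    return [
        1 if sum(w * x for w, x in zip(row, vec)) >= 0 else 0
        for row in projections
    ]


def estimated_angle(sig_a, sig_b):
    """Two sign bits disagree with probability angle / pi, so the
    mismatch fraction times pi estimates the angle in radians."""
    mismatches = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.pi * mismatches / len(sig_a)


def true_angle(a, b):
    """Exact angle between two vectors, for comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))


# Illustrative data: a data point x and an assumed cluster parameter theta.
projections = make_projections(dim=3, n_bits=2048, seed=42)
x = [1.0, 0.0, 0.0]
theta = [1.0, 1.0, 0.0]

est = estimated_angle(signature(x, projections), signature(theta, projections))
exact = true_angle(x, theta)
print(f"estimated angle: {est:.3f} rad, exact angle: {exact:.3f} rad")
```

Signatures need to be computed only once per vector, and comparing two of them costs a bit count rather than a full inner product, which is where the throughput gain in this kind of scheme comes from. More bits give a tighter estimate at the cost of more storage.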


Similar resources

Universal Hash Functions from Exponential Sums over Finite Fields and Galois Rings

In this paper new families of strongly universal hash functions, or equivalently, authentication codes, are proposed. Their parameters are derived from bounds on exponential sums over finite fields and Galois rings. This is the first time hash families based upon such exponential sums have been considered. Their performance improves the previously best known constructions and they rai...


Simplification and hierarchical representations of mixtures of exponential families

A mixture model in statistics is a powerful framework commonly used to estimate the probability measure function of a random variable. Most algorithms handling mixture models were originally specifically designed for processing mixtures of Gaussians. However, other distributions such as Poisson, multinomial, Gamma/Beta have gained interest in signal processing in the past decades. These common ...


Mean shift algorithm for exponential families with applications to speaker clustering

This work extends the mean shift algorithm from the observation space to the manifolds of parametric models that are formed by exponential families. We show how the Kullback-Leibler divergence and its dual define the corresponding affine connection and propose a method for incorporating the uncertainty in estimating the parameters. Experiments are carried out for the problem of speaker clusteri...


Recursive n-gram hashing is pairwise independent, at best

Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values. We prove that recursive hash families cannot be more than pairwise independent. While hashing by irreducible polynomials is pairwise independent, our implementations either run in time ...


Agglomerative Bregman Clustering

This manuscript develops the theory of agglomerative clustering with Bregman divergences. Geometric smoothing techniques are developed to deal with degenerate clusters. To allow for cluster models based on exponential families with overcomplete representations, Bregman divergences are developed for nondifferentiable convex functions.




Publication date: 2012